This chapter deals with the application of automatic speaker classification in telephone-based human-machine dialog systems. As a first step, we introduce a taxonomy based on three features that such systems might have. We explain these features, namely online, mirroring, and critical, together with their respective counterparts, and then use them to characterize some of the exemplary applications...
Speaker classification in forensic phonetics and acoustics is relevant for several practical tasks within this discipline, including voice analysis, voice comparison, and voice lineup. Six domains of speaker characteristics commonly used in forensic speech analysis are addressed: dialect, foreign accent, sociolect, age, gender, and medical conditions. Focussing on gender plus the less-commonly used...
A new paradigm for forensic science has been advocated in recent years, motivated by the recently reopened debate about the infallibility of some classical forensic disciplines and the controversy about the admissibility of evidence in courts. Standardization of procedures, proficiency testing, transparency in the scientific evaluation of the evidence, and testability of the system and protocols...
Speaker classification is a fundamental component of speaker identification and verification (SIV) technologies. This paper provides an overview of the many guises that classification takes within SIV systems.
In this chapter, we give a brief introduction to speech-driven applications in order to motivate why it is desirable to automatically recognize particular speaker characteristics from speech. Starting from these applications, we derive what kind of characteristics might be useful. After categorizing relevant speaker characteristics, we describe in more detail language, accent, dialect, idiolect, and...
This paper investigates how speakers can be classified into native and non-native speakers of a language on the basis of acoustic and perceptually relevant features in their speech. It describes some of the most salient acoustic properties of foreign accent, based on a comparative corpus analysis of native and non-native German and English. These properties include the durational features vowel reduction,...
Information about the age of the speaker is always present in speech. It provides perceptual cues to age for human listeners, and can be measured acoustically and used by automatic age estimators. This chapter offers an introduction to the phonetic study of speaker age, with a focus on what is known about the acoustic features that vary with age. The age-related acoustic variation in temporal as well...
In this chapter, we consider a range of issues associated with analysis, modeling, and recognition of speech under stress. We start by defining stress, what could be perceived as stress, and how it affects the speech production system. In the discussion that follows, we explore how individuals differ in their perception of stress, and hence understand the cues associated with perceiving stress. Having...
In this paper, we address the — interrelated — problems of speaker characteristics (personalization) and suboptimal performance of emotion classification in state-of-the-art modules from two different points of view: first, we focus on a specific phenomenon (irregular phonation or laryngealization) and argue that its inherent multi-functionality and speaker-dependency makes its use as feature in emotion...
This chapter focuses on the detection of emotion in speech and the impact that using technology to automate emotion detection would have within the legal system. The current states of the art for studies of perception and acoustics are described, and a number of implications for legal contexts are provided. We discuss, inter alia, assessment of emotion in others, witness credibility, forensic investigation,...
The annual NIST Speaker Recognition Evaluations (SREs) from 1996 to 2006 have been internationally recognized as the leading source of performance evaluation of research systems in the speaker classification field. We discuss how these evaluations have developed and been conducted, and the performance measures used. We consider the key factors that have been studied for their effect on performance,...
In the evaluation of speaker recognition systems, an important part of speaker classification [1], the trade-off between missed speakers and false alarms has always been an important diagnostic tool. NIST has defined the task of speaker detection with the associated Detection Cost Function (DCF) to evaluate performance, and introduced the DET-plot [2] as a diagnostic tool. Since the first evaluation...
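The DCF mentioned above combines miss and false-alarm rates into a single expected cost. A minimal sketch, using the classic NIST SRE parameterization (C_miss = 10, C_fa = 1, P_target = 0.01; the helper names and threshold convention are illustrative assumptions, not NIST's reference implementation):

```python
import numpy as np

def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates at a given decision threshold.

    A trial is accepted when its score is >= threshold (illustrative
    convention); misses are target trials below it, false alarms are
    non-target trials at or above it.
    """
    p_miss = float(np.mean(np.asarray(target_scores) < threshold))
    p_fa = float(np.mean(np.asarray(nontarget_scores) >= threshold))
    return p_miss, p_fa

def dcf(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Detection Cost Function: expected cost of detection errors,
    weighting each error type by its cost and prior probability."""
    return c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
```

Sweeping the threshold over the pooled scores and taking the minimum of `dcf` yields the familiar "minDCF" operating point; the DET-plot visualizes the same miss/false-alarm trade-off across all thresholds.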
As well as conveying a message in words and sounds, the speech signal carries information about the speaker’s own anatomy, physiology, linguistic experience and mental state. These speaker characteristics are found in speech at all levels of description: from the spectral information in the sounds to the choice of words and utterances themselves. This chapter presents an introduction to speech production...
Speaker classification requires a sufficiently accurate functional description of speaker attributes and the resources used in speaking, to be able to produce new utterances mimicking the speaker’s current physical, emotional and cognitive state, with the correct dialect, social class markers and speech habits. We lack adequate functional knowledge of why and how speakers produce the utterances they...
In this chapter we will discuss feature extraction methods for speaker classification. We introduce linear predictive coding, mel frequency cepstral coefficients and wavelets and perform experimental studies on AURORA and TIMIT data. For the speaker identification task, we can show that wavelets are beneficial.
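Of the feature types listed above, MFCCs are the most widely used. As a rough illustration of the textbook pipeline (pre-emphasis, windowed framing, power spectrum, mel filterbank, log, DCT), here is a minimal NumPy/SciPy sketch; all parameter defaults are common choices, not the chapter's exact configuration:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13,
         frame_len=0.025, frame_step=0.010):
    """Compute MFCCs from a mono signal (textbook pipeline sketch)."""
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    flen, fstep = int(sr * frame_len), int(sr * frame_step)
    n_frames = 1 + max(0, (len(sig) - flen) // fstep)
    frames = np.stack([sig[i * fstep: i * fstep + flen]
                       for i in range(n_frames)])
    frames *= np.hamming(flen)
    # Power spectrum of each frame.
    pow_spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 and the Nyquist frequency.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT to decorrelate the channels.
    feats = np.log(pow_spec @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

The DCT at the end is what makes the coefficients "cepstral"; keeping only the first few (here 13) discards fine spectral detail while retaining the envelope shape that carries most speaker- and phone-related information.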
Higher-level features based on linguistic or long-range information have attracted significant attention in automatic speaker recognition. This article briefly summarizes approaches to using higher-level features for text-independent speaker verification over the last decade. To clarify how each approach uses higher-level information, features are described in terms of their type, temporal span, and...
This chapter describes a method for enhancing the differences between speaker classes at the feature level (feature enhancement) in an automatic speaker recognition system. The original Mel-frequency cepstral coefficient (MFCC) space is projected onto a new feature space by a neural network trained on a subset of speakers which is representative for the whole target population. The new feature space...
Automatic speaker recognition systems have a foundation built on ideas and techniques from the areas of speech science for speaker characterization, pattern recognition and engineering. In this chapter we provide an overview of the features, models, and classifiers derived from these areas that are the basis for modern automatic speaker recognition systems. We describe the components of state-of-the-art...
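One classic model/classifier combination covered in such overviews is the per-speaker Gaussian mixture model scored by log-likelihood. The sketch below illustrates the idea only: the Gaussian "feature frames" are synthetic stand-ins for real MFCC vectors, and the speaker names and mixture settings are arbitrary assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 12-dimensional "feature frames" standing in for each
# speaker's MFCCs (hypothetical speakers, well-separated on purpose).
train = {
    "alice": rng.normal(0.0, 1.0, size=(500, 12)),
    "bob":   rng.normal(3.0, 1.0, size=(500, 12)),
}

# One GMM per speaker, fit on that speaker's frames only.
models = {
    name: GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(frames)
    for name, frames in train.items()
}

def identify(frames):
    """Closed-set identification: pick the speaker whose GMM assigns
    the highest average per-frame log-likelihood to the test frames."""
    return max(models, key=lambda name: models[name].score(frames))

test_frames = rng.normal(3.0, 1.0, size=(200, 12))  # drawn like "bob"
print(identify(test_frames))
```

State-of-the-art systems build on this scheme with a universal background model, MAP adaptation, and channel compensation, but the score-and-compare structure stays the same.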
Accurate detection of speaker traits has clear benefits in improving speech interfaces, finding useful information in multi-media archives, and in medical applications. Humans infer a variety of traits, robustly and effortlessly, from available sources of information, which may include vision and gestures in addition to voice. This paper examines techniques for integrating information from multiple...